Predicting the Unpredictable

July 12, 2021 | 2 min read

Injuries can have a tremendous impact on the sports industry. They affect the mental and physical state of the injured player and reduce the overall performance of the team. They also have financial consequences: the cost of medical care, recovery and rehabilitation, and the loss of earnings that the injured player's popularity would otherwise generate. These consequences have led researchers, coaches and managers to focus on injury forecasting.

Sports injuries result from a complex interplay of several factors. Because multiple intrinsic and extrinsic risk factors interact, injury prediction is a challenging task that calls for a comprehensive model. A significant drawback of recent studies in this area is that they are mono-dimensional: they consider just one parameter at a time to forecast injury risk, ignoring the complex patterns underlying the data. Such models are ineffective in practice because their precision is very low. In contrast, a multidimensional, entirely data-driven model can offer much higher accuracy.

Several studies have demonstrated a relationship between training workload and injury risk. For example, Gabbett et al. found that players whose workloads are increased above certain thresholds face a relatively higher injury risk.
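A forecaster built on a single threshold of this kind is exactly the mono-dimensional approach described above. The Python sketch below shows how simple such a rule is. The function name threshold_forecast, the workload values and the threshold of 3000 are hypothetical and are not taken from Gabbett et al. or from Rossi et al.

```python
def threshold_forecast(workloads, threshold):
    """Mono-dimensional forecaster: flag a session as risky whenever a single
    workload value exceeds a fixed threshold (threshold is hypothetical)."""
    return [w > threshold for w in workloads]


# Made-up weekly workloads (arbitrary units) for one player.
weekly_load = [2100, 2300, 2250, 3400, 2200]
print(threshold_forecast(weekly_load, threshold=3000))
# → [False, False, False, True, False]
```

Because the rule looks at one number only, it cannot capture interactions between workload, previous injuries and personal characteristics, which is why such forecasters tend to have low precision.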

The first step in injury forecasting is collecting a substantial amount of relevant data. Sports science researcher Alessio Rossi and colleagues monitored the physical activity of 26 Italian football players for 23 weeks. At the outset, they collected the players' personal features, such as age, body mass index, height and role on the field.

During training sessions, the players wore a tight-fitting top holding a GPS device with an integrated accelerometer, gyroscope and 3D digital compass.

From the GPS data, the researchers extracted 12 features (kinematic, metabolic and mechanical) describing the training workload. Another crucial feature was the number of injuries each player had suffered up to that session.
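To give a feel for how a workload feature can be derived from raw GPS data, here is a minimal Python sketch that turns a speed trace into two kinematic features: total distance and high-speed running distance. The 10 Hz sampling rate, the 5.5 m/s high-speed threshold and all function and field names are assumptions for illustration; the study's own feature definitions are those described in Fig 4.

```python
from dataclasses import dataclass


@dataclass
class KinematicFeatures:
    total_distance_m: float
    high_speed_distance_m: float


def kinematic_features(speeds_mps, sample_rate_hz=10.0, hsr_threshold_mps=5.5):
    """Approximate distances from a GPS speed trace sampled at a fixed rate.

    Each sample is assumed to hold for one sampling interval, so it contributes
    speed * interval metres (a simple rectangle-rule integration).
    Sampling rate and high-speed threshold are hypothetical defaults.
    """
    dt = 1.0 / sample_rate_hz  # seconds between consecutive samples
    total = sum(v * dt for v in speeds_mps)
    # Only samples faster than the high-speed threshold count towards this feature.
    high_speed = sum(v * dt for v in speeds_mps if v > hsr_threshold_mps)
    return KinematicFeatures(total, high_speed)


# Usage with a made-up four-sample trace (metres per second).
features = kinematic_features([2.0, 4.0, 6.0, 7.0])
print(features)
```

Real GPS pipelines usually also filter noisy samples before integrating; that step is omitted here for brevity.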

From these features, the researchers constructed a training dataset of 952 examples described by 55 features. Each example refers to a single player's training session. It consists of a vector of features describing both the player's personal characteristics and his recent workload (including the current training session), together with an injury label indicating whether or not the player got injured in the next game or training session.
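The labelling idea can be sketched as follows: each session becomes one example, and its label records whether the player is injured in the following session. This is a minimal Python sketch under stated assumptions. The input layout, the rolling mean over a hypothetical window of recent sessions and the function name build_examples are illustrative and do not reproduce the study's 55-feature construction.

```python
def build_examples(sessions, personal, window=3):
    """Turn per-player session histories into (features, label) examples.

    sessions: dict mapping player -> chronological list of (workload, injured)
              pairs, where `injured` says whether the player got injured in
              that session.
    personal: dict mapping player -> list of personal feature values.
    window:   hypothetical number of recent sessions summarised by a mean.
    """
    examples = []
    for player, history in sessions.items():
        # The last session has no "next session", so it cannot be labelled.
        for i in range(len(history) - 1):
            workload = history[i][0]
            recent = [w for w, _ in history[max(0, i - window + 1): i + 1]]
            recent_mean = sum(recent) / len(recent)
            # Injuries suffered up to and including the current session.
            previous_injuries = sum(1 for _, injured in history[: i + 1] if injured)
            features = personal[player] + [workload, recent_mean, previous_injuries]
            label = history[i + 1][1]  # injured in the next session?
            examples.append((features, label))
    return examples


# Usage with made-up data for one player.
sessions = {"player_1": [(100, False), (200, False), (300, True), (150, False)]}
personal = {"player_1": [25, 22.5]}  # hypothetical age and BMI
for features, label in build_examples(sessions, personal, window=2):
    print(features, label)
```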

For their analysis, the researchers used a machine learning technique known as a decision tree classifier. First, the collected data are split into two parts: training data and test data. Next, the algorithm learns, from the training data, the relationship between the outcome of interest (injury or no injury) and the contributing features. The test data, to which the algorithm has not been exposed, are then used to measure how well it predicts.
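As an illustration of this train/test workflow, here is a minimal, dependency-free decision tree in Python (CART-style, choosing splits by Gini impurity). It is a teaching sketch, not the implementation Rossi et al. used; in practice one would more likely use scikit-learn's DecisionTreeClassifier together with train_test_split. All function names here are hypothetical, and the data in the usage example are synthetic.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and hold out a fraction as test data."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini([y for _, y in rows])
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, max_depth=3):
    """Grow a tree recursively; stop at max_depth or when no split helps."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    split = best_split(rows) if max_depth > 0 else None
    if split is None:
        return {"leaf": majority}
    f, t = split
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[0][f] <= t], max_depth - 1),
        "right": build_tree([r for r in rows if r[0][f] > t], max_depth - 1),
    }


def predict(tree, x):
    """Follow threshold tests from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Synthetic usage: the label is True whenever the first feature exceeds 5.
data = [([float(i), 0.0], i > 5) for i in range(20)]
train, test = train_test_split(data, test_fraction=0.25, seed=1)
tree = build_tree(train, max_depth=2)
test_accuracy = sum(predict(tree, x) == y for x, y in test) / len(test)
print(f"test accuracy: {test_accuracy:.2f}")
```

Limiting max_depth keeps the tree small, which matters here: a shallow tree yields a handful of readable rules, whereas a deep one can memorise the training data and generalise poorly to the held-out sessions.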

Two important metrics that determine the quality of a forecasting model are precision and recall. Precision indicates how reliable the forecaster is: the higher the precision, the more likely it is that a player flagged as at risk actually gets injured. Recall is the ratio of the injuries the forecaster detected to the total number of injuries. A forecaster with higher recall is capable of predicting a higher fraction of injuries.
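In terms of true positives (TP), false positives (FP) and false negatives (FN), precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python sketch below computes both for a forecaster's predictions; the function name is hypothetical and the outcomes are made-up values chosen only to show the arithmetic.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (True = injury)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # injury predicted and occurred
    fp = sum(1 for t, p in pairs if not t and p)      # injury predicted, none occurred
    fn = sum(1 for t, p in pairs if t and not p)      # injury occurred but was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up outcomes for 8 sessions (True = injury).
actual    = [True, False, False, True, False, True, False, False]
predicted = [True, True,  True,  True, False, False, False, False]
print(precision_recall(actual, predicted))
# → (0.5, 0.6666666666666666)
```

Here the forecaster raises four alarms, of which two are real injuries (precision 0.5), and it catches two of the three injuries that occur (recall 2/3).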

Compared with other forecasters, this multidimensional approach based on decision tree classifiers performs appreciably better.

Machine learning techniques such as deep neural networks may achieve even greater accuracy. However, the rules such models learn are not explicit, even to the data scientist who builds them, which makes them hard to interpret and therefore less practical in this setting. Coaches need to understand not only when a player might get hurt, but also why. The challenge for data scientists developing injury forecasters is therefore to balance accuracy against interpretability.
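Decision trees sit at the interpretable end of this trade-off because every root-to-leaf path is a readable conjunction of threshold conditions; this is how injury rules such as those in Fig 1(a) can be read off a tree. The Python sketch below walks a small hand-written tree and prints one IF-THEN rule per injury leaf. The nested-dictionary tree format, the function name extract_rules, the feature names and the thresholds are all hypothetical and do not reproduce the tree learned in the study.

```python
def extract_rules(tree, feature_names, target=True, conditions=()):
    """Return one readable rule per leaf that predicts `target`.

    Internal nodes look like {"feature": i, "threshold": t, "left": ..., "right": ...};
    leaves look like {"leaf": label}. The left branch means feature <= threshold.
    """
    if "leaf" in tree:
        return [" AND ".join(conditions) or "always"] if tree["leaf"] == target else []
    name, t = feature_names[tree["feature"]], tree["threshold"]
    return (extract_rules(tree["left"], feature_names, target, conditions + (f"{name} <= {t}",))
            + extract_rules(tree["right"], feature_names, target, conditions + (f"{name} > {t}",)))


# A small hand-written tree with hypothetical features and thresholds.
tree = {
    "feature": 0, "threshold": 2,
    "left": {"leaf": False},
    "right": {
        "feature": 1, "threshold": 1500,
        "left": {"leaf": False},
        "right": {"leaf": True},
    },
}
names = ["previous_injuries", "high_speed_distance_m"]
for rule in extract_rules(tree, names):
    print("IF", rule, "THEN injury")
# → IF previous_injuries > 2 AND high_speed_distance_m > 1500 THEN injury
```

A coach can check a rule like this against their own experience, which is exactly what an opaque model does not allow.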

Fig 1. Interpretation of the multi-dimensional injury forecaster.
(a) The six injury rules extracted from the decision tree (DT). For each rule we show the range of values of every feature, its frequency (Freq) and accuracy (Acc). (b) A schematic visualization of the decision tree. Black boxes are decision nodes, green boxes are leaf nodes for class No-Injury and red boxes are leaf nodes for class Injury. Source

Fig 2.
A football player wears a vest holding a GPS sensor. The captured data are fed into an algorithm. Source

Fig 3.
GPS data from sensors carried by players can help coaches to decide who is at risk of injury. Source

Fig 4.
Description of the training workload features extracted from GPS data and the players’ personal features collected during the study. Four categories of features: kinematic features (blue), metabolic features (red), mechanical features (green) and personal features (white). Source

Fig 5.
Construction of the training dataset and the forecasting model. Source